The present study provided the first experimental application of
continuous  and  direct  recording  of  operant  methodology  to  the  clinical 
judgment  process.  This  novel  application  attempted  to  provide  initial 
answers  to  four  questions:  1)  How  stable  are  the  daily  predictions 
made  by  judges?  2)  How  does  the  time  involved  in  making  a  clinical 
judgment  influence  accuracy?  3)  What  is  the  effect  of  an  increase  in 
available  information  on  clinical  prediction?  4)  What  is  the  effect 
of  experience  level  on  clinical  judgment? 

Six judges, two first-year graduate students in Psychology
(F-1; F-2), two Psychology practicum students (P-1; P-2),
and two interns (I-1; I-2), were asked to make daily discriminations
between test protocols belonging to men convicted of first- or second-degree
murder and those belonging to men convicted of crimes against
property. Levels of information were increased in four phases:
Experiment 1 - MMPI's only; Phase 1 - New set of MMPI's; Phase 2 -
MMPI's and Rorschachs; Phase 3 - MMPI's, Rorschachs, and Biographical
Data; Phase 4 - MMPI's, Rorschachs, Biographical Data, and
Formulas.

The  frequency  correct  and  frequency  incorrect  of  the  daily 
discriminations  were  plotted  for  each  judge  on  a  Standard  Behavior 
Chart.     The  results  were  described  in  terms  of  how  each  new  phase 
change  in  the  experiment  affected  the  daily  clinical  judgment  of  each 
judge.     These  effects  were  discussed  in  terms  of  accuracy,  efficiency, 
step,  growth  and  total  bounce. 

The  results  demonstrated  a  number  of  different  effects.     In  some 
cases,  the  effects  were  consistent  with  previous  research;  for  example, 
the  overall   low  accuracy  level   for  most  judges.     However,  some  of  the 
findings were unanticipated, particularly the overall negligible
effect  of  adding  information  to  the  clinical   judgment  process.     The 
results  also  demonstrated  a  number  of  new  effects.     These  were:     1) 
The  direct  and  systematic  replication,  across  and  within  judges,  of 
the stability of the daily predictions across phases; 2) The replication
of the lack of essential differences between celeration
frequencies and celeration record floor. These findings provide a
starting point for future applications of operant methodology to
the study of the clinical judgment process.


INTRODUCTION 

The  major  emphasis  in  the  field  of  psychological  assessment  has 
recently  turned  away  from  construction  and  validation  of  tests  toward 
a  fuller  consideration  of  other  factors  in  the  assessment  situation. 
A  large  number  of  studies  have  been  carried  out  focusing  upon  various 
aspects  of  the  complex  process  of  clinical  prediction.  Some  studies 
have  been  concerned  with  issues  of  theoretical  importance  and  others 
have  also  dealt  with  more  practical  contributions. 

Clinical versus statistical prediction. --One of the main areas
of research since Meehl's (1954) influential book has been the comparison
of clinicians with statistical formulae. Meehl (1954) cited 20 relevant
studies  comparing  actuarial  with  clinical  methods  and  found  that  in  all 
but  one  of  these,  actuarial  methods  consistently  equaled  or  surpassed 
clinicians  in  accuracy  of  predicting  a  criterion.     This  finding 
provided  a  force  potentially  capable  of  crumbling  the  diagnostic 
foundations  of  clinical  psychology  as  it  is  practiced  today.     In  a 
later  paper  Meehl    (1956)  emphasized  that,  while  the  clinician  has 
some  useful  and  unique  talents,  clinical   prediction  is  not  one  of 
them,  and  he  concluded  that  for  many  diagnostic  problems  the  clinician 
is  a  costly  middleman  who  could  be  more  productive  doing  research  and 
therapy.     Interest  in  the  clinical   versus  actuarial   issue  continued, 
presumably in response to Meehl's challenge that those who found
his "box score" disturbing should publish research which refutes it
(Meehl, 1969). Meehl (1965) recently reviewed the relevant research
literature  published  since  1954,  and  concluded  again  that  two-thirds 
of  the  fifty  studies  showed  a  statistically  significant  superiority  of 
actuarial   methods. 

Holt (1958) criticized Meehl's conclusions on the basis that
several  stages  are  involved  in  clinical  prediction  and  that  Meehl's 
comparison  focused  only  on  one  of  these.     He  argued  that  the  cross- 
validation  possible  with  actuarial   techniques  is  not  possible  with 
clinical   techniques,  and  that,   therefore,   the  two  methods  are  not 
logically  comparable.     Holt  (1958,   1970)  states  that  the  evidence  in 
favor  of  actuarial   methods  may  be  a  function  of  the  experimental 
design,  which  puts  the  clinician  at  a  disadvantage,   rather  than  the 
actual   superiority  of  statistical  prediction.     An  additional  problem  is 
that  the  clinician  has  seldom  been  given  the  opportunity  to  incorporate 
the  actuarial   information  in  formulating  his  final   decision. 

Available information. --The information available to the clinician
often has been based on non-quantitative data such as interview
material,  case  history  data,  and  projective  tests.     Goldberg  (1968)  has 
shown  that  in  such  situations,  the  clinician  has  been  inferior  to  the 
actuarial   methods  and  his  judgmental  accuracy  has  decreased  both  with 
increased  levels  of  test  information  and  clinical   experience.     Shagoury 
and Satz (1969) examined the effects of levels of quantitative information
on judgmental accuracy in a clinical statistical decision-making
task (brain damage versus normal protocols). Judges were provided with
increasing increments of statistical information. The results showed
that judgmental accuracy increased substantially with increased levels
of information. The increase in judgmental accuracy was also shown to
vary with different strategies which the judges utilized during the
experiment.

Moxley  and  Satz  (1970)  asked  judges  to  make  postdictive  judgments 
on  the  length  of  stay  in  psychotherapy  (short  or  long)   for  a  sample  of 
mental health service clients. Judgments were made under four conditions
in which tests and statistical information increased incrementally
at each level. It was found that accuracy increased over
levels  of  information.     Moxley   (1970)   states  that  if  "the  clinician 
is  able  to  incorporate  the  statistical   information  he  may  equal  or 
surpass  the  accuracy  of  actuarial  methods." 

Effect of level of training. --Recent studies focusing on statistical
and clinical prediction have also dealt with the question
of  experience.     The  evidence  indicates,  rather  surprisingly,  that  as 
experience  increases,  prediction  accuracy  decreases.     In  his  review 
of  the  qualities  which  contribute  to  the  ability  to  judge  people, 
Taft  (1955)  reported  that  persons  without  clinical   training,  such  as 
physical  scientists  and  experimental  psychologists,  are  more  accurate 
judges  of  people  than  are  clinical   psychologists  or  clinical   psychology 
students.     Goldberg  (1959)  showed  that  trained  clinicians  were  not 
superior  to  graduate  students  or  secretaries  in  predicting  brain 
damage  from  Bender-Gestalt  protocols.     Sarbin,  Taft,  and  Bailey  (1960) 
brought the literature up to date when they reviewed fourteen additional
similar studies. Most found no difference in accuracy between
clinicians and other judges; the rest were equally distributed in
finding the clinician either superior or inferior. Grebstein (1963)
found no significant differences in accuracy among naive, semi-sophisticated,
and sophisticated judges. In a study using judgments of
psychosis or neurosis from MMPI data, Goldberg (1965) found that staff
judges  and  trainees  achieved  the  same  accuracy  on  the  average,  that 
the  four  best  and  two  worst  judges  were  trainees,  and  that  there  was 
wide  variability  on  each  diagnostician's  performance  over  different 
samples.      In  a  more  recent  study,  Perez  and  Satz  (1971)   found  that 
there  were  slight  differences  in  accuracy  between  graduate  students 
in  clinical   psychology  and  clinicians   in  predicting  length  of  stay  in 
therapy  from  MMPI  profiles.     There  was  a  higher  overall   hit  rate  for 
graduate students than clinicians. Taft (1955) advances the explanation
that trained professionals, somewhere in their training, acquire
a  "set"  which  interferes  with  making  unbiased  objective  decisions. 

The process of clinical thinking: cognitive models. --Among the
researchers who have attempted to study and describe the workings of
the clinician's "mind" there seem to be two camps. There are those
who hold that clinical thinking is a mystical, intuitive, and thus
inexplicable process. There are others who theorize that the clinician's
thoughts are orderly, logical, definable, and even, in some
cases, deserving of mathematical models.

Among  the  former  is  Luft  (1950)  who  has  described  the  process  of 
clinical  judgment  as  a  sifting,  screening,  and  synthesizing  of  case 
materials   in  "some  intangible  way."     Another  is  Mann   (1956)  who  in  his 
review  of  Meehl's   (1954)  book  emphasized  the  complexity  of  human 
decision-making  and  the  importance  of  considering  all   the  factors 
involved  in  judgments. 

Perhaps  one  of  the  earliest  major  efforts  to  demonstrate  that  the 
diagnostic  process  is  capable  of  truly  rigorous  investigation  was 
Hoffman's (1960) study in which he reduced the diagnostic process to
mathematical models. He proposed both linear and configural models,
and  suggested  that  a  fruitful  approach  to  research  would  involve 
focusing  upon  the  individual  as  the  unit  of  research,  and  studying 
his  behavior  as  it  relates  to  each  of  these  mathematical  models. 
In  one  analysis  of  components  of  clinical    inference  Hammond,  Hursch, 
and  Todd  (1964)  applied  a  multiple  regression  technique  to  Brunswik's 
lens  model.     The  statistical   components  derived  were  used  to  examine 
several  types  of  previous  studies.     Judges  were  found  generally  to 
combine  cues   linearly,  but  the  authors  argued  that  the  simple  rating 
tasks  studied  are  conducive  to  a  linear  response  system.     They 
suggested  that,  for  more  complex  judgments,  the  lens  model  of  analysis 
can  prove  useful   in  studying  human  cognitive  processes. 

Studying  the  way  in  which  the  clinician  processes  data,  Wiggins 
and  Hoffman  (1968)  devised  three  statistical  models,  and  compared  them 
with  clinical  judgments  of  psychosis  and  neurosis  from  MMPI  profiles. 
A  linear,  a  quadratic,  and  a  sign  model  were  used.     Results  indicated 
that  the  sign  model  best  described  13  judges,  the  quadratic  model   3, 
and  the  linear  model   12.     There  was  no  significant  relationship  between 
configural   or  linear  style  and  amount  of  clinical   training,  or  between 
style  and  accuracy. 

In  an  effort  to  study  the  complexity  of  the  process  of  clinical 
judgment,  Oskamp  (1967)  utilized  both  Hoffman's  multiple  regression 
procedures  and  an  analysis  of  validity  coefficients  to  investigate 
clinical   judgments  from  MMPI  profiles.     Judges  made  a  dichotomous 
decision  of  whether  the  patient  was  hospitalized  for  medical  or 
psychiatric reasons. Oskamp found that the conclusion drawn regarding
the complexity of the process was dependent upon the type of
analysis used to study it. Multiple regression analysis suggested that
the  judgment  process  is  simple,  and  that  judges  used  predictor  variables 
in  a  linear  additive  way.  The  validity  coefficient  analysis  suggested 
that  judgments  are  made  in  a  complex,  configural  way,  which  agreed  with 
the  judges'  subjective  impressions  of  the  way  in  which  they  utilized 
the  data. 

Feedback and clinical prediction. --Given the discouraging information
that clinical judgments are often inaccurate, a number of
researchers  have  directed  their  attention  to  the  use  of  feedback  as  a 
device  to  improve  accuracy.  Feedback  always  involves  giving  the  subject 
some  information  about  his  performance  in  successive  trials.  Bilodeau 
and  Bilodeau  (1961)  say,  "Studies  of  feedback  or  knowledge  of  results 
(KR)  show  it  to  be  the  strongest,  most  important  variable  controlling 
performance  and  learning."  Ammons  (1956),  in  surveying  the  effects  of 
knowledge  of  performance,  concluded  that  KR  almost  universally  results 
in  more  rapid  learning  and  a  higher  level  of  performance. 

There  have  been  many  ideas  advanced  about  the  role  of  feedback  in 
terms  of  how  it  may  influence  work,  learning,  and  performance  (e.g., 
Ammons,  1956).  Sechrest,  Gallimore,  and  Hersh  (1967)  present  two 
hypotheses:  a)  that  feedback  operates  by  providing  information  by 
means  of  which  the  subject  can  adjust  his  implicit  hypotheses;  or  b) 
that feedback serves a motivational function by convincing and reminding
the subject that the task is one in which improvement is expected
and possible. According to Underwood (1966), the most dramatic
effects  of  KR  can  be  shown  for  tasks  in  which  the  precision  of  the 
response  is  initially  very  poor,  and  for  which  the  subject  can  give 
himself  at  best  minimal  feedback. 


Sechrest,  Gallimore,  and  Hersh  (1967)  devised  three  experiments  to 
provide feedback on predictive accuracy in the expectation that feedback
could be used to improve performance. Their study stemmed from a
recommendation  by  Holt  (1958)  that  clinicians  should  have  training 
which  makes  it  possible  for  them  to  validate  themselves  as  predictors 
in  much  the  same  way  as  tests  are  cross-validated.  The  "clinicians" 
studied  were  undergraduate  students,  and  the  prediction  task  involved 
interpretation  of  short  sentence  completion  protocols.  They  found 
evidence  in  all  three  experiments  for  the  superior  performance  of  those 
subjects  who  received  feedback,  but  the  bulk  of  the  evidence  suggested 
that  the  feedback  effect  was  attributable  to  enhancement  of  motivation 
of  the  subjects,  rather  than  to  specific  informational  value. 

Rotter  (1967)  points  out,  however,  that  the  Sechrest,  Gallimore, 
and  Hersh  study  had  a  number  of  design  limitations.  These  are:  a)  the 
subjects  were  undergraduates  with  little  experience  and  possibly  low 
motivation;  b)  test  data  (10  ISB  responses)  may  have  been  too  small  for 
a  valid  judgment;  c)  while  knowledge  of  how  the  criterion  was  determined 
was  given  to  the  subjects,  this  knowledge  could  only  have  provided  a 
minimum  of  information  to  the  subjects;  d)  feedback  was  given  with  an 
overall  judgment  for  correct  or  incorrect  without  knowing  whether  or  not 
the  subjects  had  made  their  hypotheses  explicit,  or  whether  or  not  the 
hypotheses  the  subjects  were  relying  upon  were  relevant  or  irrelevant. 

Watley  (1968)  studied  the  effect  of  providing  immediate  feedback 
training to judges known from a previous study (Watley, 1966) to predict
educational criteria at relatively high, moderate, or low levels
of accuracy. The criteria predicted were freshman and overall college
grades. In comparison to judges who received no training, the forecasts
of low-accuracy judges showed substantial improvements for both predicted
criteria; however, the training had no noticeable effect on the
judgments of the high- or moderate-accuracy judges.

Perez  (1970)  studied  the  effects  of  feedback  on  a  problem  that 
clinicians  face  in  their  practice  --  that  of  predicting  length  of  stay 
in  psychotherapy.     Sixteen  judges,  eight  clinical   psychologists  and 
eight  graduate  students   in  clinical   psychology  were  asked  to  predict 
length  of  stay  in  psychotherapy  from  MMPI  profiles.     Judges  were 
randomly  divided  into  a  feedback  condition  and  a  no-feedback  condition. 
It  was  hypothesized  that  judges   in  the  feedback  condition  would  do 
significantly  better  than  judges   in  the  no-feedback  condition.     The 
results,  while  in  the  predicted  direction,  were  not  significant, 
largely  because  of  the  high  level   of  initial   accuracy  on  this  clinically 
relevant  task.     In  agreement  with  Watley's   (1967)   finding,   inspection 
of  the  performance  of  each  judge  individually  revealed  that  feedback 
was  most  beneficial   for  those  judges  starting  at  a  low  accuracy  level. 

The  clinician  behaving.     --Clinical   psychologists  have  produced  a 
widely  varied  number  of  hypotheses  in  response  to  the  evidence  that 
they  have  not  yet  demonstrated  their  diagnostic  prowess. 

Hunt,  Wittson,  and  Hunt  (1953)  have  suggested  that  the  confusion 
may result not only from lack of ability in the diagnostician, but also
from poorly delineated diagnostic categories and the professional customs
which permit careless diagnosis or inaccurate diagnosis for administrative
reasons. Some (Holt, 1958; Sawyer, 1965; and Taft, 1959) have
proposed  that  the  available  research  comparisons  between  clinical  and 
statistical   methods  are  essentially  not  parallel  and  are  therefore 
meaningless. Others (Hunt, Arnhoff, and Cotton, 1954; Hunt and Jones,
1962) have pointed out that the application of formal scoring, mathematical
models, and certain statistical treatments to clinical data
tends to distort research findings.

Little (1967) and Rotter (1967) have pointed to such artificialities
in research studies as the use of undergraduate judges to represent
experienced clinicians, the lack of adequate criterion data,
the use of inadequate test data for the prediction required, and incomplete
description of the criterion to be predicted. Some (Payne,
1958;  Cole  and  Magnussen,  1966)  have  argued  that  diagnostic  test 
scores  are  useless,  since  the  diagnostic  categories  themselves  are 
not  related  to  symptoms,  etiology,  treatment,  or  prognosis,  and  that 
the  prediction  of  diagnostic  labels  has  much  less  meaning  than  would 
the  prediction  of  the  behavioral  consequences  of  them.  Many  have 
continued  their  attempts  to  develop  better  test  instruments.  Meehl 
(1960),  in  contrast,  has  proposed  that  diagnosis  be  left  to  the 
superior computer and that the clinician concentrate on research and
psychotherapy. 

It  is  presently  argued  that  the  conflicting  results  found  by 
experimenters  in  the  area  of  clinical  prediction  are  due  largely  to 
lack  of  an  adequate  experimental  methodology  to  study  this  complex 
task. Bachrach (1965) states that the goals of science are description,
explanation, prediction, and control. From the previous review
of  the  literature,  it  is  clear  that  the  research  area  of  clinical 
prediction  is  far  behind  in  reaching  these  goals.  Meehl  (1954)  stated 
that  "Presumably  some  kind  of  longitudinal  study  is  needed  to  find  out 
whether  and  to  what  degree  the  'good'  clinician  is  stably  such,  rather 
than  being  merely  the  momentarily  luckiest  fellow  among  a  crew  of  equal 
or near-equal mediocre guessers." No attempt has yet been made to
provide  an  answer  to  this  challenge.     The  precise  experimental 
methodology  of  free-operant  conditioning,  applied  to  the  continuous 
and  direct  recording  of  the  clinical   judgmental   process,  provides  a 
unique  opportunity  to  study  judges  making  their  predictions  on  a 
longitudinal   basis. 

Experimental   analysis  of  clinical   prediction.     --Since  Sidman's 
(1960)   influential   book,  Tactics  of  Scientific  Research,   psychology 
has  witnessed  many  innovative  applications  of  operant  methodology  to 
traditional   problem  areas   (Ullmann  and  Krasner,   1965;   Ulrich,  Stachnik, 
and  Mabry,  1969,   1970).     A  most  notable  example  is  Lindsley's   (1969) 
application  of  the  continuous  and  direct  measurement  of  operant 
methodology  to  the  study  of  traditional   psychotherapy.     The  underlying 
thesis  of  these  applications  has  been  that  variability  is  not  intrinsic 
to the subject matter but stems, rather, from discoverable and controllable
causes. Any sample of behavior is under the control of a
multiplicity  of  variables,  some  of  them  presumably  held  constant  in  a 
given  experiment,  and  others  simply  unrecognized.     Sometimes  the 
variability  in  a  set  of  data  can  be  located  among  such  factors.     Two 
subjects  may  be  found  to  differ  in  their  response  to  variable  A,  not 
because  there  is   intrinsic  variability  in  the  relation  between 
variable  A  and  behavior,  but  because  they  differ  in  their  response  to 
variable  B,  which  interacts  with  variable  A.     The  process  of  tracking 
down  sources  of  variability,  and  thus  explaining  variable  data,  is 
characteristic  of  the  scientific  enterprise  (Sidman,   1960). 

Sidman (1960) believes that the control of data in research does
not depend upon the amassing of large groups of subjects, or even
large samples from an individual subject. He states that "We must
consider our science immeasurably enriched each time someone brings
another sample of behavior under precise experimental control."
Sidman  believes  that  the  adequacy  of  a  technique  in  experimental 
psychology should be evaluated in terms of the reliability and precision
of the control it achieves over the independent variables.

According  to  Sidman  (1960),  experiments  are  often  carried  out 
to  test  the  fruitfulness  of  a  new  technique.  Sometimes  the  technique 
is  developed  deliberately  in  order  to  obtain  information  that  could 
not  be  gained  by  standard  methods;  sometimes  the  technique  is  simply 
tried  out  of  curiosity  as  to  the  kind  of  data  it  will  yield.  Technical 
developments  in  experimental  psychology  may  include  improvements  in 
measuring instruments, advanced methods of recording data, sophisticated
data analysis, the design of specialized apparatus to do a
particular  job,  or  generalized  apparatus  to  perform  many  functions, 
and  the  extension  of  old  techniques  to  new  areas. 

It  is  clear  that  the  need  for  more  objective  research  in  the 
field  of  clinical  judgment  has  been  articulately  expressed  (Rotter, 
1967).  This  study  represents  a  broad,  but  systematic,  attempt  to 
understand  the  process  of  clinical  judgment.  A  more  fruitful  and 
rigorous  approach  to  this  problem  is  proposed  through  the  systematic 
investigation  of  relevant  variables  as  they  influence  the  judgment 
process.  By  the  application  of  continuous  and  direct  measurement  to 
clinical  judgment,  a  unique  opportunity  exists  to  bring  new  light  into 
this  intriguing  area. 

Precise measurement of clinical prediction. --O. R. Lindsley and
his  associates  have  developed  the  most  powerful  single  tool  to  measure 
human  behavior  -  the  Standard  Behavior  Chart.  This  chart  permits  the 
daily  recording  of  behavior  frequencies  ranging  from  1000  per  minute 
to  one  per  day;  frequencies  ranging  from  one  per  day  to  one  per 
twenty  weeks  may  also  be  recorded  without  changing  the  coordinates  of 
the  chart.  This  chart,  therefore,  provides  us  with  a  standardized 
means  of  depicting  and  analyzing  frequencies  of  virtually  any  human 
behavior on a continuous basis. Further, because it is standardized,
it facilitates the comparison of data across different levels of
clinical information and different experience levels, and shows
how these variables affect clinical prediction. Even more
importantly, the Standard Behavior Chart greatly facilitates communication
of research findings among scientists.
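
The span of the daily chart can be checked with a little arithmetic. The sketch below is an illustration only, not part of the original method. It assumes a full 24-hour (1,440 minute) recording day and a logarithmic frequency scale, converts one per day to a per-minute frequency, and counts the base-10 cycles between that floor and 1000 per minute. The function names are hypothetical.

```python
import math

MINUTES_PER_DAY = 24 * 60  # assumption: a full 24-hour observation day


def per_day_to_per_minute(count_per_day: float) -> float:
    """Convert a count per day into a frequency per minute."""
    return count_per_day / MINUTES_PER_DAY


def log_cycles(low: float, high: float) -> float:
    """Number of base-10 cycles between two positive frequencies."""
    return math.log10(high / low)


# One per day expressed per minute (roughly 0.0007 per minute).
floor = per_day_to_per_minute(1)

# Cycles spanned from one per day up to 1000 per minute (a little over six).
cycles = log_cycles(floor, 1000.0)
```

Under these assumptions the range from one per day to 1000 per minute covers slightly more than six logarithmic cycles.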

According to Wolking and Schwartz (1972), there are several
features of the precise behavioral measurement system which make it
distinctive from traditional statistical measures. These are: (1) The
basic unit of measurement is frequency, which is defined as the
number of behaviors emitted divided by the number of
minutes during which the behavior has been observed. Thus, the inescapable
dimension of time is made an integral part of the basic
datum. (2) A very important feature of this system is that it
measures behavior directly. Direct measurement involves defining
the behavior so precisely that it can be counted with high reliability
(pinpointing); counting the number of occurrences of the behavior and
the number of minutes of observation and making a permanent record of
both (recording); and finally, calculating the frequency of the
behavior observed by dividing the number of movements by the number
of minutes. (3) There should be continuous recording; that is, the
movement should be measured daily, or every time the behaver engages
in the behavior if it occurs less often than daily. (4) The most important
feature of this system is the graphic representation of the daily
rates, which provides a unique opportunity for rapid and accurate
communication and comparison of facts about behavioral processes.
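
The frequency definition in feature (1) can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the `DailyRecord` structure, the function names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DailyRecord:
    """One day's tally for one judge (hypothetical structure)."""
    correct: int      # number of correct discriminations
    incorrect: int    # number of incorrect discriminations
    minutes: float    # minutes spent judging that day


def frequency(count: int, minutes: float) -> float:
    """Frequency = number of behaviors / minutes of observation."""
    if minutes <= 0:
        raise ValueError("minutes of observation must be positive")
    return count / minutes


def daily_frequencies(record: DailyRecord) -> Tuple[float, float]:
    """Return (frequency correct, frequency incorrect), each per minute."""
    return (frequency(record.correct, record.minutes),
            frequency(record.incorrect, record.minutes))


# Illustrative numbers only, not data from the study.
day = DailyRecord(correct=12, incorrect=8, minutes=10.0)
freq_correct, freq_incorrect = daily_frequencies(day)
```

Each day's pair of values corresponds to the two points that would be plotted for a judge: one for frequency correct and one for frequency incorrect.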


RESEARCH  QUESTIONS 

In  considering  the  present  multifaceted  design,  a  number  of 
research  questions  arise.  The  present  knowledge  in  the  area  of 
clinical  prediction  is  too  limited  and  contradictory  to  permit 
formulation  in  the  context  of  hypotheses.  However,  the  research 
questions  below  are  important  in  formulating  the  analysis  and 
presentation  of  the  data.  These  questions  are: 


1.  How  stable  are  the  daily  predictions  made  by  judges? 
This  is  a  most  relevant  question  which  has  not  been 
researched  in  the  past.  The  present  study,  emphasizing 
continuous  and  direct  recording  of  rate  correct  and  rate 
incorrect  of  predictions,  provides  a  first  step  in 
exploring  this  basic  question  on  a  longitudinal  basis. 

2.  What  effects  will  an  increase  in  available  information 
have  on  clinical  prediction?  Research  in  this  area 
indicates, in general, that as available information
increases,  accuracy  increases.  The  present  design 
provides  a  unique  opportunity  to  measure  this  effect 
directly  on  each  individual  judge,  as  well  as  to 
measure  its  stability  over  time. 

3. What effect will experience level have on clinical
prediction? Previous research indicates, in general,
that as experience level increases, accuracy decreases.
The application of the present methodology
provides  a  more  powerful  technique  to  compare  each 
judge  independently. 

4.  How  does  the  time  involved  in  making  a  clinical 
prediction  influence  judgmental  accuracy? 
Researchers  have  not  previously  studied  this 
question.  The  use  of  rate  in  analyzing  judgmental 
accuracy  provides  a  most  sensitive  and  natural 
measure,  for  it  takes  into  consideration  the 
amount  of  time  spent  in  making  a  clinical  judgment. 
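
Why rate is a more sensitive measure than a simple proportion can be shown with a small hypothetical comparison. The sketch below is illustrative only. The two judges and their tallies are invented, and the function names are assumptions, not terms from the study.

```python
def proportion_correct(correct: int, incorrect: int) -> float:
    """Share of judgments that were correct, ignoring time."""
    return correct / (correct + incorrect)


def rate_per_minute(count: int, minutes: float) -> float:
    """Count divided by the minutes spent judging."""
    return count / minutes


# Hypothetical judges: identical tallies, different time spent.
judge_a = {"correct": 15, "incorrect": 5, "minutes": 10.0}
judge_b = {"correct": 15, "incorrect": 5, "minutes": 30.0}

for name, judge in (("A", judge_a), ("B", judge_b)):
    pc = proportion_correct(judge["correct"], judge["incorrect"])
    rc = rate_per_minute(judge["correct"], judge["minutes"])
    print(f"Judge {name}: proportion correct {pc:.2f}, rate correct {rc:.2f}/min")
```

Both hypothetical judges have the same proportion correct, but their rates correct differ by a factor of three, a difference that a proportion alone would hide.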


METHOD 


Subjects 

To  insure  the  best  control  of  this  relevant  variable,  judges  with 
similar  backgrounds  and  experience  with  diagnostics  were  chosen.     Six 
judges participated, representing three levels of experience. Level
1--Two first-year graduate students (F-1, F-2); Level 2--Two practicum
students (P-1, P-2); Level 3--Two interns (I-1, I-2). All judges
were chosen from the University of Florida and are currently enrolled
as graduate students in clinical psychology. Each judge was his own
control.

Materials 

Test  materials  were  collected  by  Shagoury  (1971).     He  studied 
60  men  imprisoned  in  the  Florida  State  Prison,  Raiford,  Florida. 
Thirty  men,  convicted  of  first-  or  second-degree  murder,  were  compared 
on  certain  genetic,  biological  and  personality  factors  with  thirty  men 
convicted  of  crimes  against  property.     Shagoury   (1971)  collected  bio- 
graphical  information,  MMPI's,  Rorschach's,  EEG's,  and  chromosome 
analyses.     These  materials  were  used  in  the  present  study. 

Procedures 


Refer to Table I for a schematic of the design. Judges were asked
to predict whether the material presented was that of a homicide or
non-homicide (for instructions, see Appendix A). Shagoury (1971)
found that a discriminant function analysis correctly classified 83%
of the total sample. Following the free operant methodology of continuous
and direct recording, each judge was asked to make his predictions
on a daily basis. The research consisted of two experiments.

[Table I. Schematic of the experimental design.]

Experiment  I.     --The  purpose  of  this  experiment  was  to  study  the 
stability  of  the  daily  predictions  made  by  judges   using  MMPI's  only. 
This  area  has  received  no  attention  in  the  experimental   literature. 
Operant  methodology,  with  its  unique  feature  of  continuous  and  direct 
recording,  provides  a  most  powerful   tool   to  study  this  phenomenon. 

Sidman   (1960)   defines  a  stable,  or  steady,  state  as  one  in  which 
the  behavior  in  question  does  not  change  its  characteristics  over  a 
period  of  time.     Two  major  types  of  experimental   interest  in  steady- 
state  behavior  have  developed.     One  of  these  may  be  termed  "descriptive" 
and  the  other  "manipulative."     Experiment  I  is  a  purely  descriptive 
study in which a set of experimental conditions is maintained over an
extended period of time, providing an account of both the stable and
the transitory aspects of the resulting behavior. This form of
research is fundamental to the establishment of behavioral control
techniques,  and  of  baselines  from  which  to  measure  behavioral   changes. 
The  data  yielded  by  such  an  experiment  do  not  relate  an  aspect  of 
behavior  to  several   values  of  a  manipulated  independent  variable. 
Rather,  the  resulting  curves  show  some  aspect  of  behavior  as  a  function 
of  time  in  the  experimental   situation.     It  is  the  characteristics  of 
behavior  in  time,   under  a  constant  set  of  maintaining  conditions,  which 
are of major interest. According to Sidman (1960), the descriptive
investigation of steady-state behavior must precede any manipulative
study.     Manipulation  of  new  variables  will   often  produce  behavioral 
changes,  but,   in  order  to  describe  the  changes,  we  must  be  able  to 
specify  the  baseline  from  which  they  occurred.     Otherwise,  we  face 
insoluble  problems  of  control,  measurement,  and  generality. 

A  major  problem  faced  in  experiments  involving  the  manipulation 
of  steady-states  is  that  of  deciding  whether  the  behavior  in  question 
has  stabilized.     According  to  Sidman  (1960),  there  is  no  assuredly 
final   answer.     He  states  that  "The  utility  of  data  will   depend  not  on 
whether ultimate stability has been achieved, but rather on the
reliability and validity of the criterion. That is to say, does the
criterion  select  a  reproducible  and  generalizable  state  of  behavior? 
If  it  does,  experimental  manipulation  of  steady-states,  as  defined  by 
the  criterion,  will  yield  data  that  are  orderly  and  generalizable  to 
other  situations.     If  the  steady-state  criterion  is  inadequate, 
failures  to  reproduce  and  to  replicate  systematically  the  experimental 
findings  will   reveal   this  fact." 

How  does  one  select  a  steady-state  criterion?     There  is,  according 
to Sidman (1960), no rule to follow, for the criterion will depend
upon the phenomenon being investigated and upon the level of experimental
control that can be maintained. Here, descriptive long-term studies
of steady-state behavior are extremely useful. By following behavior
over  an  extended  period  of  time,  with  no  change  in  the  experimental 
conditions,   it  is  possible  to  make  an  estimate  of  the  degree  of 
stability  that  can  eventually  be  maintained;  a  criterion  can  then  be 
selected  on  the  basis  of  these  observations.     The  adequacy  of  the 
criterion chosen can be confirmed by the orderliness of the resulting
data. If the steady-state criterion yields orderly and replicable
functional relations, it may be accepted as adequate.

Procedure  for  Experiment  I.     —Judges  were  presented  with  20  MMPI 
profiles,   10  belonging  to  homicides  and  10  to  non-homicides   (base  rate 
of  .5).     They  were  asked  to  discriminate  between  the  profiles  of 
homicides  and  non-homicides.     Each  judge  was  presented  with  the  same 
set  of  profiles  on  a  daily  basis  until  stability  of  prediction  was 
reached.     The  criterion  of  stability  was  orderliness  in  the  data. 

Experiment  II.     --This  experiment  consisted  of  a  systematic 
replication  of  Experiment  I   plus  the  study  of  the  effects  of  adding 
new  information  on  clinical  judgment.     Four  phases  were  involved: 

Phase  1   --  Systematic  replication  of  Experiment  I.     According  to 
Sidman  (1960),   the  soundest  empirical   test  of  the  reliability  of  data 
is  provided  by  replication.     The  application  of  continuous  and  direct 
recording  provided  a  unique  opportunity  to  attempt  to  replicate  the 
findings  of  Experiment  I. 

Phase 2 -- Phase 1 was used as baseline data to study the effects
on clinical judgment of adding new information, in this case Rorschach
protocols, to MMPI profiles. The research previously reviewed
(Goldberg, 1968; Shagoury and Satz, 1969; Moxley and Satz, 1970) seems to
indicate that as more information is available to the clinician,
judgment accuracy increases. The present methodology provided a more
powerful technique to study this phenomenon on a daily basis instead
of in the single-session format of previous studies.

Phase 3 -- This phase was identical to the previous phase except
that biographical and EEG data were added to the existing information.
Phase 2 was used as baseline data.

Phase 4 -- This phase was identical to the two previous phases
except that a summary of the findings of Shagoury's (1971) multivariate
analysis of the data was provided to each judge to assist in making his
judgment. Phase 3 was used as baseline data. Orderliness of data was
the criterion used for termination of this phase.

Procedure for Experiment II. --Phase 1 -- Judges were asked on
a daily basis to discriminate homicides from non-homicides using a new
set of 20 MMPI profiles with a base rate of .5. This phase was
discontinued when stability was reached. The criterion of stability was
orderliness in the data as well as comparison with stability in
Experiment I.

Phase  2  —  Judges  continued  making  their  daily  predictions.  In 
this  phase,  the  MMPI  profiles  of  Phase  1  and  the  appropriate  Rorschach 
protocols  were  utilized.  Orderliness  of  data  was  the  criterion  for 
stability. 

Phase  3  --  This  phase  was  identical  to  Phase  2,  except  that  judges 
made  their  daily  predictions  with  the  addition  of  biographical  data 
and  EEG  reports. 

Phase  4  --  Judges  continued  making  their  daily  predictions,  but 
this  time  a  summary  of  the  relevant  findings,  as  found  by  a  multivariate 
analysis  on  the  previous  personality,  biographical  and  biological  data, 
was  given  to  each  judge  (Shagoury,  1971). 


RESULTS 

The  measures  used  in  this  study,  frequency  of  correct  predictions 
and  frequency  of  incorrect  predictions,  were  plotted  on  Standard 
Behavior  Charts  (Behavior  Research  Co.).  Plotting  linear  data  on  a 
log  scale  provides  one  with  a  picture  of  proportional  changes  in 
behavior  frequencies  rather  than  absolute  changes  (Koenig,  1972). 
Information  that  the  frequency  of  occurrence  of  a  given  behavior  has 
doubled  or  halved  is  considerably  more  valuable  than  information  that 
the  frequency  of  occurrence  of  the  behavior  has  changed  by  one 
arbitrarily  defined  unit. 
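
The contrast between proportional and absolute change can be illustrated
with a short sketch. It is an illustration only and not part of the
original analysis; the function name `log_distance` and the example
frequencies are assumptions chosen for the example.

```python
import math


def log_distance(before, after):
    """Vertical distance between two frequencies on a log10 axis.

    Hypothetical helper: equal proportional changes give equal distances.
    """
    return math.log10(after) - math.log10(before)


# A doubling from 1 to 2 and a doubling from 5 to 10 differ in absolute
# size (1 versus 5) but cover the same distance on a log scale.
small_doubling = log_distance(1, 2)
large_doubling = log_distance(5, 10)
print(math.isclose(small_doubling, large_doubling))
```

On such a scale a halving appears as the same distance in the opposite
direction, which is why doubling and halving are treated as comparable
changes.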

In  order  to  understand  the  present  results,  it  is  necessary  to 
briefly  familiarize  the  reader  with  the  Standard  Behavior  Chart  as 
well  as  the  current  procedures  of  data  analysis. 

Chart  scales.  --The  horizontal  dimension  across  the  bottom  of  the 
chart  represents  calendar  days.  Each  chart  runs  for  140  consecutive 
days  or  20  weeks.  The  vertical  dimension  up  the  left  side  of  the  chart 
is  the  scale  of  frequencies  or  rates.  The  unit  of  measurement  is 
movements  per  minute. 

Record floor. --The record floor is the lowest measurable
performance frequency other than zero. The record floor is found by
dividing  the  number  of  minutes  in  the  time  sample  into  one,  the  smallest 
number  of  movement  cycles  that  can  be  observed.  The  record  floor  sets 
the lower limit of the sensitivity of the chart as a measurement system
for each day. Below the floor is an area of record blindness. The
symbol  of  the  record  floor  is  a  horizontal  dashed  line  at  the  computed 
level  of  the  floor  for  a  given  day. 
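
The record floor computation described above can be sketched as follows.
The function names and the session values (12 correct and 8 incorrect
predictions in 10 minutes) are hypothetical and serve only to show the
arithmetic.

```python
def record_floor(minutes_observed):
    """Lowest nonzero frequency observable in a session: 1 / minutes."""
    if minutes_observed <= 0:
        raise ValueError("observation time must be positive")
    return 1.0 / minutes_observed


def frequency(count, minutes_observed):
    """Movements per minute for a count observed over a session."""
    return count / minutes_observed


# Hypothetical session: 20 profiles judged in 10 minutes.
minutes = 10
print(frequency(12, minutes))   # frequency correct, per minute
print(frequency(8, minutes))    # frequency incorrect, per minute
print(record_floor(minutes))    # record floor for that day
```

A shorter session yields a higher record floor, so a floor that rises
across days reflects less time spent on the same set of profiles.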

Celeration. --Few statistical measures are available for
describing continuous changes in behavior over time. Therefore,
researchers interested in continuous observation and recording of
behavior have developed several new measures for this purpose (Koenig,
1972). Frequencies displayed on the Behavior Chart are usually either
accelerating (x) or decelerating (÷) as time passes. Celeration is
the general term for these accelerating and decelerating relationships.
Celeration is a measure of the change in frequency of responding
over a period of one week. The celeration coefficient is functionally
related to the slope of the line of best fit and is obtained by
using the least squares method of regression. The celeration
coefficients are the main measures employed in the present data analyses.
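
One way such a coefficient could be computed is sketched below. The
sketch assumes that the regression is fitted to the base-10 logarithm of
the daily frequencies and that the slope is converted into a weekly
multiplier; the text does not spell out these details, and the function
names and data are hypothetical.

```python
import math


def celeration_per_week(days, frequencies):
    """Weekly multiplier from a least-squares line through log10 frequencies.

    Assumes all frequencies are above zero (a zero cannot be logged).
    """
    logs = [math.log10(f) for f in frequencies]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope_per_day = sxy / sxx
    return 10 ** (7 * slope_per_day)


def chart_notation(multiplier):
    """Write a multiplier as 'x 2.00' (acceleration) or '÷ 2.00' (deceleration)."""
    if multiplier >= 1:
        return f"x {multiplier:.2f}"
    return f"÷ {1 / multiplier:.2f}"


# Hypothetical frequencies that double every seven days.
print(chart_notation(celeration_per_week([0, 7, 14], [1.0, 2.0, 4.0])))
```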

The  results  of  the  present  study  are  described  in  terms  of  how 
each  new  phase  change  in  the  experiment  affected  the  daily  clinical 
judgments  of  each  judge.  These  effects  are  discussed  in  terms  of 
accuracy, efficiency, step, growth and total bounce. Each of these
measures is discussed under a separate heading with the presentation
of  the  results.  The  actual  charts  of  the  daily  predictions  for  each 
judge  are  located  in  Appendix  B. 

Accuracy  Ratio  Celeration 

The  accuracy  ratio  is  defined  as  the  ratio  between  frequency 
correct  and  frequency  incorrect.  The  daily  accuracy  ratio  for  each 
judge was plotted on a Standard Behavior Chart. A value of one
indicates that the frequency correct is equal to the frequency
incorrect. A value less than one indicates that the frequency incorrect
is higher than the frequency correct, and a value greater than one
indicates that the frequency correct is higher than the frequency
incorrect. The reader might want to convert these values into
percentages (e.g., x 1.0 = 50%; x 9.0 = 90%). The accuracy ratio
celeration  measure  provides  the  opportunity  to  compare  the  celerating 
effects  of  adding  new  information  to  the  clinical  judgment  process  for 
each  judge  as  well  as  across  judges. 
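
The conversion from accuracy ratio to percentage follows from the
definition: if r is the ratio of correct to incorrect, the proportion
correct is r / (1 + r). A minimal sketch, with hypothetical function
names:

```python
def accuracy_ratio(freq_correct, freq_incorrect):
    """Ratio of frequency correct to frequency incorrect."""
    return freq_correct / freq_incorrect


def ratio_to_percent(ratio):
    """Percentage correct implied by an accuracy ratio: 100 * r / (1 + r)."""
    return 100.0 * ratio / (1.0 + ratio)


# The two values given in the text as examples.
print(ratio_to_percent(1.0))  # 50.0
print(ratio_to_percent(9.0))  # 90.0
```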

Figures 1 through 6 present graphically the daily accuracy ratios
for each judge. Table 2 shows a summary of the accuracy ratio
celerations per phase for each judge. Inspection of Table 2 shows that the
accuracy ratio celeration coefficients ranged from ÷ 1.56 movements
per minute per week (M/m/w) to x 2.27 M/m/w. Figure 7 shows a graphical
summary of the accuracy ratio celerations across judges. Overall, there
was essentially no acceleration or deceleration of accuracy over time.
There were four exceptions. Figure 3 shows that P - 1's accuracy
accelerated x 2.27 M/m/w in Exp 1 (MMPI's only) and x 1.6 M/m/w in
Phase 2 (MMPI's + Rorschachs). Figure 6 shows that I - 2's accuracy
accelerated x 1.51 M/m/w in Phase 2 (MMPI's + Rorschachs) and decelerated
÷ 1.56 M/m/w in Phase 4 (MMPI's + Rorschachs + Biographical Data +
Formulas).

Accuracy  Ratio  Frequency  Multiplier 

To measure the effects of a new procedure on the first day of a
phase, the frequency multiplier, or step, is used. The frequency
multiplier gives a measure of the increase or decrease of frequency
correct or frequency incorrect the first day new information is added
to the clinical judgment process. It is a comparison of the last data
point of the old phase with the first data point of the new phase.
A frequency multiplier of x 1.0 movements per minute per day (M/m/d)
indicates that there has been no increase or decrease in accuracy with
the introduction of new information to the clinical judgment process.
A step of x 2.0 M/m/d indicates that accuracy has doubled with the
introduction of a new phase. A step of ÷ 2.0 M/m/d indicates that
accuracy has halved with the introduction of a new phase.
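
The step can be sketched as a simple quotient of two data points. The
function names and the example values are hypothetical; the sketch
assumes the multiplier is the first value of the new phase divided by
the last value of the old phase.

```python
def frequency_multiplier(last_of_old_phase, first_of_new_phase):
    """Step between phases: first new data point over last old data point."""
    return first_of_new_phase / last_of_old_phase


def chart_notation(multiplier):
    """Write a step as 'x 2.0' (increase) or '÷ 2.0' (decrease)."""
    if multiplier >= 1:
        return f"x {multiplier:.1f}"
    return f"÷ {1 / multiplier:.1f}"


# Hypothetical accuracy ratios on either side of a phase change.
print(chart_notation(frequency_multiplier(1.0, 2.0)))  # accuracy doubled
print(chart_notation(frequency_multiplier(2.0, 1.0)))  # accuracy halved
print(chart_notation(frequency_multiplier(1.5, 1.5)))  # no change
```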

Figure 8 shows graphically the steps for each judge as new phases
were introduced. The measures for each phase from left to right belong
to: F - 1; F - 2; P - 1; P - 2; I - 1; I - 2. Table 3 presents a summary
of the accuracy ratio frequency multipliers. Inspection of Table 3
indicates that the accuracy ratio frequency multipliers ranged from
÷ 3.0 M/m/d to x 4.0 M/m/d. These two measures belong to I - 2. Figure 8
indicates that the maximum accelerating steps were obtained with the
addition of Phase 4 (formulas). The addition of Phase 2 (Rorschachs)
produced overall the least change in accuracy except for I - 2, whose
accuracy decreased ÷ 3 M/m/d. The addition of Phase 1 (new set of MMPI's)
as well as the addition of Phase 3 (biographical data) produced the most
momentary decreases in accuracy.

Record  Floor  Celeration  -  Efficiency 

In the present experiment the daily record floor indicates the
amount of time a judge spent in making his daily predictions. It is,
therefore, a measure of efficiency. Record floor celerations for each
judge are located in Appendix B. Table 4 shows a summary of the record
floor celerations per phase for each judge. Inspection of Table 4
indicates that the record floor celeration coefficients ranged from
÷ 1.21 M/m/w to x 3.22 M/m/w. Figure 9 shows a graphical summary of
the record floor celerations. Overall, there was clearly an increase
in the efficiency of the judges' daily predictions. The maximum
acceleration in efficiency (x 3.22 M/m/w) was obtained for I - 2 in
Phase 4 (MMPI's + Rorschachs + Biographical Data + Formulas). The
maximum deceleration of efficiency (÷ 1.21 M/m/w) was observed for
F - 1 in Exp 1 (MMPI's only).

Record Floor Frequency Multiplier

To  assess  the  immediate  effects  on  efficiency  of  adding  new 
information  to  the  clinical  judgment  process,  the  record  floor 
frequency  multiplier  for  each  judge  was  computed.  The  frequency 
multiplier  is  a  comparison  of  the  last  record  floor  of  the  old  phase 
with  the  first  record  floor  of  the  new  phase. 

Figure 10 shows graphically the record floor steps for each judge
as new phases were introduced. The measures for each phase from left
to right belong to: F - 1; F - 2; P - 1; P - 2; I - 1; I - 2. Table 5
presents a summary of the record floor frequency multipliers. Inspection
of Table 5 indicates that the record floor frequency multipliers ranged
from ÷ 20.8 M/m/d to x 1.21 M/m/d. Figure 10 indicates that in all
cases except one the addition of new information produced an immediate
reduction in efficiency. The exception occurred with F - 1 with the
addition of Phase 3 (biographical data), which produced an almost
insignificant increment of x 1.2 M/m/d. Inspection of Figure 10 indicates
that the greatest overall decrease in efficiency occurred with the
addition of Phase 2 (Rorschachs). The second most noticeable decrease
in efficiency occurred with the addition of Phase 4 (formulas).

Frequency  and  Record  Floor  Growth  Ratio  -  Effectiveness 

The growth ratio is used to assess the relationship between two
celerations. In the present study we are interested in the
relationship between celeration correct and celeration record floor as
well as celeration incorrect and celeration record floor. In other words,
the growth ratio provides a measure of the relationship between the
celeration of the time spent in making the daily predictions and the
celeration of the daily correct and incorrect frequencies. The growth
ratio is independent of both the initial frequencies and the two
celerations. It is, therefore, a measure of effectiveness. A growth
ratio of 1.00 indicates that the celeration correct or celeration
incorrect is the same as the celeration record floor. A growth ratio
greater than one indicates that the celeration correct or celeration
incorrect is greater than the celeration record floor. A growth ratio
less than one indicates that the celeration record floor is greater
than the celeration frequency correct or incorrect.
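
The interpretation rules above are consistent with the growth ratio
being the quotient of two celerations. The sketch below makes that
assumption explicit; it is not taken from the original analysis. It
writes each celeration as a plain multiplier (so ÷ 2.0 becomes 0.5),
and the function names and values are hypothetical.

```python
def growth_ratio(celeration_frequency, celeration_record_floor):
    """Assumed form: frequency celeration over record floor celeration."""
    return celeration_frequency / celeration_record_floor


def interpret(ratio, tolerance=1e-9):
    """Restate the three cases described in the text."""
    if abs(ratio - 1.0) <= tolerance:
        return "frequency and record floor celerate equally"
    if ratio > 1.0:
        return "frequency celerates faster than the record floor"
    return "record floor celerates faster than the frequency"


# Hypothetical weekly multipliers for one judge in one phase.
print(interpret(growth_ratio(1.5, 1.5)))
print(interpret(growth_ratio(2.0, 1.6)))
print(interpret(growth_ratio(1.2, 1.5)))
```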

Table  6  shows  the  frequency  correct  and  record  floor  growth  ratios 
per  phase  for  each  judge.  Inspection  of  Table  6  indicates  that  the 
growth  ratios  ranged  from  .85  to  1.48.  Overall,  there  was  essentially 
no  difference  between  correct  celeration  and  record  floor  celeration. 
There  were  four  exceptions.  P  -  1  obtained  a  growth  ratio  of  1.48  in 
Exp  1  (MMPI)  and  1.24  in  Phase  2  (MMPI's  +  Rorschachs).  I  -  2  obtained 
a  growth  ratio  of  1.26  in  Phase  2  (MMPI's  +  Rorschachs)  and  .85  in 
Phase 4 (MMPI's + Rorschachs + Biographical Data + Formulas).

Table 7 shows the frequency incorrect and record floor growth
ratios  per  phase  for  each  judge.  Inspection  of  Table  7  indicates  that 
the  growth  ratios  ranged  from  .56  to  1.32.  Overall,  there  was 
essentially  no  difference  between  the  incorrect  celeration  and  record 
floor  celeration.  There  were  five  exceptions.  P  -  1  obtained  a 
growth  ratio  of  .56  in  Exp  1  (MMPI),  .78  in  Phase  1  (MMPI's  only)  and 
.74  in  Phase  2  (MMPI's  +  Rorschachs).  I  -  2  obtained  a  growth  ratio 
of  .84  in  Phase  2  (MMPI's  +  Rorschachs)  and  1.32  in  Phase  4  (MMPI's  + 
Rorschachs  +  Biographical  Data  +  Formulas). 

Accuracy  Ratio  Total  Bounce  -  Variability 

In  order  to  assess  the  variance  around  celeration  lines  on  the 
Behavior  Chart,  Koenig  (1972)  has  developed  the  total  bounce  measure. 
To  find  the  total  bounce,  a  line  is  drawn  parallel  to  the  celeration 
line  through  the  frequency  farthest  above  it.  Then  a  line  is  drawn 
parallel  to  the  celeration  line  through  the  frequency  farthest  below 
it.  The  distance  between  these  two  outer  lines,  expressed  as  a  ratio, 
defines  the  total  bounce  around  the  celeration  line.  Koenig  (1972) 
has  shown  that  the  proportional  variance  around  the  straight  line  of 
celerating  frequencies  usually  remains  constant  regardless  of  the 
value  of  the  frequencies.  Thus,  total  bounce  is  used  as  a  measure  of 
homogeneous  variability  of  the  daily  predictions. 
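
The construction of the total bounce can be sketched numerically. The
sketch assumes that the celeration line is a least-squares line through
the base-10 logarithms of the frequencies, so that the two parallel
lines pass through the largest and smallest residuals and the distance
between them converts back to a ratio. The function name and data are
hypothetical.

```python
import math


def total_bounce(days, frequencies):
    """Ratio between parallel lines through the highest and lowest points.

    Assumes frequencies above zero and a log10 least-squares celeration line.
    """
    logs = [math.log10(f) for f in frequencies]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(days, logs)]
    return 10 ** (max(residuals) - min(residuals))


# Points lying exactly on a straight line in log space: no bounce.
print(round(total_bounce([0, 1, 2, 3], [1.0, 2.0, 4.0, 8.0]), 2))
# A flat trend with one value four times the others: bounce of 4.
print(round(total_bounce([0, 1, 2], [1.0, 4.0, 1.0]), 2))
```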

Table 8 presents a summary of the accuracy ratio total bounce per
phase for each judge. Inspection of Table 8 reveals that the total
bounce ranged from x 6.00 to x 1.00. The highest total bounce of x 6
was obtained by I - 2 with the addition of Phase 2 (Rorschachs).
A total bounce of x 1.0 indicates that there is no variance around
the line of best fit. P - 1 obtained a total bounce measure of x 1.0
in all phases except Exp 1 (MMPI's). In comparison with Koenig's
(1972) data,¹ the present results indicate that the accuracy ratio
total bounce for each judge is considerably below the average. This
can be taken as a powerful indication of stability of accuracy in
daily judgments.


1.  Koenig  (1972)  investigated  13,941  human  behavior  projects 
deposited  in  the  Behavior  Bank  and  found  that  the  average 
total  bounce  was  x  5.9. 


DISCUSSION 

The  present  study  provided  the  first  experimental  application  of 
continuous  and  direct  recording  of  operant  methodology  to  the  clinical 
judgment  process.  This  novel  application  attempted  to  provide  initial 
answers  to  four  questions:  1)  How  stable  are  the  daily  predictions 
made by judges? 2) How does the time involved in making a clinical
judgment influence accuracy? 3) What is the effect of an increase in
available information on clinical prediction? 4) What is the effect
of  experience  level  on  clinical  judgment?  The  present  results  are 
discussed  within  the  framework  of  these  questions. 

The  results  demonstrated  a  number  of  different  effects.  In  some 
cases, the effects were consistent with previous research; for example,
the  overall  low  accuracy  level  for  most  judges.  However,  some  of  the 
findings  were  unanticipated,  particularly  the  overall  negligible  effect 
of  adding  information  to  the  clinical  judgment  process.  The  results 
also  demonstrated  a  number  of  new  effects.  These  were:  1)  The  direct 
and  systematic  replication,  across  and  within  judges,  of  the  stability 
of  the  daily  predictions  across  phases;  2)  The  replication  of  the 
increase  in  efficiency,  across  phases,  for  each  judge;  and,  3)  The 
replication  of  the  lack  of  essential  differences  between  celeration 
frequencies  and  celeration  record  floor.  These  findings  provide  a 
starting point for future research on clinical judgment. Only
systematic and direct replication of the present study will establish
the reliability and generality of these results.

Stability of Daily Predictions

It has been eighteen years since Meehl (1954) stated that
"Presumably some kind of longitudinal study is needed to find out whether
and  to  what  degree  the  'good'  clinician  is  stably  such,  rather  than 
being  merely  the  momentarily  luckiest  fellow  among  a  crew  of  equal  or 
near-equal  mediocre  guessers."  The  present  study  provides  a  partial 
answer.  The  results  indicate  that  in  all  cases  (judges  and  phases) 
except  one,  the  individual  predictions  were  stable.  The  exception 
was  I  -  2,  with  the  addition  of  Phase  2  (Rorschachs).  However,  since 
the  accuracy  level  for  most  judges  across  phases  was  50%,  these  results 
have  to  be  interpreted  with  caution.  A  stable  50%  accuracy  level  is 
easy  to  maintain.  In  our  sample  of  judges,  only  P  -  1  (See  Figure  3) 
maintained  stability  above  50%  accuracy  across  phases.  He  was  the  only 
steady  "good"  clinician  that  could  be  identified.  Inspection  of  each 
chart  indicates  that  F  -  2's  predictions  (See  Figure  2)  in  Phase  4 
(MMPI's  +  Rorschachs  +  Biographical  Data  +  Formulas)  were  stable  above 
50%  accuracy  as  well  as  P  -  2's  (See  Figure  4)  in  Phase  2  (MMPI's  + 
Rorschachs).  These  findings  indicate  that,  in  the  present  sample  of 
judges, when a judge was identified as "good" (identified by
consistently predicting correctly above chance) at least in one phase,
his  predictions  were  stable  across  that  phase.  Future  longitudinal 
research  should  identify  these  "good"  clinicians  before  attempting 
to replicate the present findings.

Efficiency of Daily Predictions

The  use  of  frequencies  in  analyzing  judgmental  accuracy  provided 
a  most  sensitive  and  natural  measure  of  efficiency,  for  it  considered 
the  amount  of  time  spent  in  making  a  clinical  judgment.  The  present 
results  showed  that,  overall,  there  was  a  clear  increase  in  the 
efficiency  of  the  judges'  daily  predictions.  It  was  also  shown  that 
efficiency  decreased  when  new  information  was  added  to  the  clinical 
judgment process. Inspection of Figure 10 shows that the maximum
decrease in efficiency was obtained with the addition of Phase 2
(Rorschachs). This can be taken as an indication that the integration
and interpretation of the Rorschachs combined with the MMPI's required
the most time and consequently produced the maximum drop in efficiency.
It is interesting to note that the maximum decreases in efficiency in
Phase 2 (MMPI's + Rorschachs) occurred, in all cases (judges within
Phase 2) except one, with the medium (P - 2) and high experienced
(I - 1; I - 2) judges. These judges had knowledge in the interpretation
of the Rorschach; therefore, a decrease in efficiency is an indication
that they were making use of this knowledge. The second most
noticeable decrease in efficiency occurred with the addition of Phase 4
(formulas). Once more, the maximum decrease in efficiency occurred
with the medium (P - 1; P - 2) and high experienced (I - 1; I - 2)
judges. It seems that the least experienced judges (F - 1; F - 2),
presented with a novel set of information, decided not to spend much
additional time in attempting to integrate this new information.

Effectiveness of Daily Predictions

The  effectiveness  ratio  (growth)  provided  a  measure  of  the 
relationship  between  the  celeration  of  the  time  spent  in  making  the 
daily predictions and the celeration of the daily correct and
incorrect frequencies. The present results indicate that there was
essentially  no  difference  between  either  the  correct  celeration  and 
record  floor  celeration  (See  Table  6)  or  the  incorrect  celeration  and 
record  floor  celeration  (See  Table  7).  This  indicates  that,  overall, 
most  judges  within  each  phase  expended  less  time  in  making  their 
daily  predictions  as  the  phase  progressed,  but  their  accuracies  were 
uniquely  stable  within  and  across  phases.  That  is,  they  became  more 
efficient  without  a  concomitant  increase  or  decrease  in  accuracy. 
Two  judges  were  the  exception.  P  -  1  (See  Figure  3)  increased  in 
efficiency  and  accuracy  throughout  Exp  1  (MMPI's)  and  throughout 
Phase  2  (MMPI's  +  Rorschachs).  I  -  2  (See  Figure  6)  increased  in 
efficiency  and  accuracy  throughout  Phase  2  (MMPI's  +  Rorschachs)  but 
decreased  in  accuracy  and  increased  in  efficiency  in  Phase  4  (MMPI's 
+  Rorschachs  +  Biographical  Data  +  Formulas). 

Levels  of  Information  Across  Levels  of  Experience  and  Accuracy  of 
Daily  Predictions 

The  accuracy  ratio  celeration  measure  and  the  accuracy  ratio 
frequency  multiplier  provided  the  opportunity  to  compare  the  celerating 
effects  of  adding  new  information  to  the  clinical  judgment  process 
for  each  judge  as  well  as  across  levels  of  experience.  The  present 
results indicate, in general, that there was essentially neither
acceleration nor deceleration of accuracy over time within a phase.
The  results  also  indicate  that  across  phases,  the  maximum  accelerating 
steps  were  obtained  with  the  addition  of  Phase  4  (formulas).  The 
addition  of  Phase  2  (Rorschachs)  produced  overall  the  least  change 
in  accuracy.  The  addition  of  Phase  1  (new  set  of  MMPI's)  as  well  as 
the  addition  of  Phase  3  (biographical  data)  produced  the  most  initial 
decreases  in  accuracy.  Individual  differences  were  observed  across 
phases  between  judges.  These  individual  differences  are  discussed 
according  to  levels  of  information. 

Exp 1 - MMPI's Only. --Most judges predicted at a 50% accuracy
level (Figures 1 through 6). There were two exceptions, both occurring
with medium experienced judges. P - 1 (See Figure 3) increased his
accuracy in the second week of this task to a high of 70% and remained
stable until the end of the phase. P - 2 (See Figure 4) predicted
consistently below chance and his accuracy did not accelerate across time.

Phase  1  -  MMPI's  Only.  —Most  judges  predicted  at  a  50%  accuracy 
level  (Figures  1  through  6).  There  was  one  exception.  P  -  1  (See 
Figure  3)  predicted  at  a  60%  accuracy  level  on  four  of  the  five  days 
in the second week of this phase. Table 8 indicates that the
introduction of Phase 1 produced a momentary decrease in accuracy in both
of  the  non-experienced  judges  (F  -  1;  F  -  2);  in  one  medium-experienced 
judge  (P  -  1);  and  in  one  high-experienced  judge  (I  -  1).  Since  there 
was  essentially  no  celeration  in  accuracy  in  this  phase  (See  Figure  2), 
these  effects  were  not  permanent. 
Phase 2 - MMPI's + Rorschachs. —Three out of six judges predicted
mostly at a 50% accuracy level (Figures 1 through 6). The
addition of Phase 2 (Rorschachs) had neither initial effects (See Figure 8)
nor celerating effects (See Figures 1 and 2) on the non-experienced
judges (F - 1; F - 2). These judges predicted mostly at a 50% accuracy
level. The addition of this phase produced no initial effect in
P - 1 (See Figure 8), but his accuracy increased above 50% in the
second week of this phase. P - 2's accuracy increased initially with
the addition of this phase (See Figure 8) and remained at 60% (See
Figure 4) for the rest of the phase. The addition of Phase 2 had neither
initial (See Figure 8) nor celerating (See Figure 5) effects on I - 1.
I - 2's judgments became unstable with the addition of Phase 2 (See
Table 8). The initial effect on I - 2 was an immediate decrease in
accuracy (See Figure 8), but accuracy accelerated (See Figure 6) within
the phase to a terminal accuracy of 60%.

Phase 3 - MMPI's + Rorschachs + Biographical Data. —Most judges
predicted at a 50% accuracy level (Figures 1 through 6). Among the
non-experienced judges, the addition of Biographical Data produced
an initial decrease in accuracy for F - 2 (See Figure 8), but no
effects were found for F - 1 (See Figure 8). The addition of this
phase produced an initial decrease in accuracy for both of the
medium-experienced judges. P - 1's predictions (See Figure 3) remained
stable at 60% accuracy and P - 2's predictions (See Figure 4) remained
stable at 50% accuracy. The addition of this phase had no effect
on I - 1 (See Figure 5); his predictions remained at 50%. I - 2's
predictions (See Figure 6) decreased initially from 60% to 50% accuracy
with the new phase and remained stable at this level.

Phase 4 - MMPI's + Rorschachs + Biographical Data + Formulas.
—The addition of this phase produced overall the maximum increase in
accuracy of all phases. The addition of this phase elicited an initial
increase in accuracy for the two non-experienced judges (See Figure 3).
F - 1's accuracy (See Figure 1) increased to a maximum of 70% but
decelerated within this phase. F - 2's accuracy increased initially
(See Figure 8) to 60% (See Figure 2) and occasionally to 70%. The
addition of Phase 4 (formulas) produced the maximum increase in
accuracy for the non-experienced judges. For P - 1 (See Figure 3) the
addition of the formulas produced neither an initial step in accuracy
(See Figure 8) nor a celeration effect. His predictions remained
stable at 60% accuracy. The same occurred for P - 2 (See Figure 4),
but in this case his predictions remained stable at 50% accuracy.
Figure 8 indicates that the addition of Phase 4 (formulas) produced
no initial effect on I - 1 and an increase in accuracy for I - 2.
I - 1's accuracy remained constant at 50% (See Figure 5). I - 2's
accuracy decelerated from an initial 70% accuracy to a terminal 60%
accuracy (See Figure 6).

To summarize, the addition of new clinical information to the
judgmental process did not substantially increase accuracy across
phases. The only exceptions occurred with the addition of Phase 4
(formulas): the increase in accuracy (sometimes to 70%) was replicated
across both non-experienced judges (F - 1; F - 2), and I - 2's accuracy
increased to a maximum of 70% with a terminal accuracy of 60%. It is
interesting to note that the two non-experienced judges increased in
accuracy from 50% to 70% with the addition of Phase 4 (formulas).
Shagoury (1971) found that these formulas accurately predicted 83% of
the sample of homicides. It seems, from these results, that the
non-experienced judges (F - 1; F - 2) tended to ignore the actual test
protocols and looked for the relevant cues provided by the formulas.
The same could be said for I - 2. Nevertheless, no judge approximated
the overall accuracy of the formulas.

On the basis of the preceding findings, a few general comments
can be made. Some of these comments may, at present, lack generality.
This study is only a first attempt to apply the single-subject
research methodology of experimental analysis to the study of the clinical
judgment process. Future replications of these findings will provide
the final test of the reliability and generality of the present data.

The  present  results  are  in  conflict  with  more  traditional  studies 
of  increase  in  levels  of  information.     These  studies   (Shagoury  &  Satz, 
1969;  Moxley  &  Satz,   1970)   found  that  accuracy  increased  as   levels 
of  information  increased.     It  should  be  pointed  out,  however,  that 
the  kind  of  information  presently  used  was  in  part  different  from  the 
two  previous  studies.     In  these  two  studies,  the  information  used  was 
quantitative   (Z  scores;  base  rates;  conditional   probabilities;  etc.), 
and  in  the  present  study,  some  information   (MMPI's;   Rorschachs)  was 
qualitative,  and  some  (biographical   data;   formulas)  was  quantitative. 
It  is  interesting  to  note  that,  in  the  present  study,  Phase  4 
(formulas),  which  was  purely  quantitative  data,  produced  the  maximum 
increase  in  accuracy.     Future  applications  of  experimental  analysis 
to clinical judgment should use quantitative data only, so that a
better comparison between these studies can be made.

The present results indicated that the "good" clinician
(identified by consistently predicting correctly above chance) is
stably "good" in his clinical judgment, and not merely the
"momentarily luckiest fellow among a crew of equal or near-equal mediocre
guessers" as stated by Meehl (1954). This finding was replicated
across phases for P - 1, the only "good" clinician that could be
identified, as well as within phases for F - 2 and P - 2. This
finding is intriguing and warrants more longitudinal studies
of "good" judges.
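
The distinction between a stably "good" judge and a momentarily lucky guesser can be made concrete with a chance baseline. The sketch below is an illustration only: it assumes 20 independent judgments per day at a base rate of .5 and treats days as independent, neither of which is established by the present data. A judge who deliberately sorts the cases into ten and ten would follow a different (hypergeometric) distribution.

```python
from math import comb

def prob_at_least(k, n=20, p=0.5):
    """Binomial probability of k or more correct out of n independent guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of reaching 70% (14 of 20) or better on one day by guessing alone.
one_day = prob_at_least(14)

# Chance of doing so on five days in a row, if days were independent (assumed).
five_days = one_day ** 5

print(f"one day at 70% or better by chance:  {one_day:.4f}")
print(f"five such days in a row by chance:   {five_days:.2e}")
```

Under these assumptions a single 70% day arises by guessing a little under 6% of the time, whereas a run of such days is far less likely, which is why stability over time is the more informative criterion.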

A discouraging result was the overwhelmingly low accuracy of most
judges in the present sample. Shagoury (1971) found that a
discriminant function analysis accurately discriminated 83% of the total
sample. Most judges discriminated between homicides and non-homicides
at  50%  accuracy,  with  the  best  judges  reaching  a  ceiling  of  70% 
accuracy.     A  number  of  interpretations  can  be  provided  to  explain 
these  results.     One  is  that  the  random  sample  of  cases  chosen  could 
have  been  the  ones  missed  by  the  discriminant  function,  and,  thus, 
the  most  difficult  to  discriminate.     A  second  possibility  is  that 
special   training  may  be  needed  to  combine  the  available  information 
to  make  an  accurate  discrimination.     This  possibility  is  warranted  by 
the  observation  that  experience  level  had  no  noticeable  effect  on 
judgmental  accuracy.     Future  research  should  test  this  possibility 
by  using  feedback  to  specifically  train  judges  to  discriminate  between 
homicide  and  non-homicide  test  protocols  and  test  the  accuracy  of 
their  predictions  with  a  new  sample.     A  third  and  most  threatening 
hypothesis, previously proposed by Meehl (1956), is that clinical
judgment  is  not  one  of  the  talents  of  the  clinician  and  therefore  he 
should  relinquish  the  role  of  clinical  judgment  to  the  more  accurate 
computer.  Recent  findings  by  Blumetti  (1972)  provide  the  most 
convincing  argument  against  this  proposition. 

Most  importantly,  the  present  study  brought  a  new  sample  of 
human  behavior,  in  this  case  clinical  judgment,  under  precise  and 
continuous  measurement.  This  was  accomplished  through  the  uniqueness 
of  the  Standard  Behavior  Chart. 

INSTRUCTIONS
Phase  I 

This  is  a  research  study  investigating  clinical  judgment.  You 
will  be  presented  with  20  Minnesota  Multiphasic  Personality  Inventory 
(MMPI)  profiles  of  inmates  at  Raiford  State  Prison.  Ten  of  the  twenty 
MMPI  profiles  belong  to  men  convicted  of  first  or  second  degree  murder. 
The  remaining  ten  profiles  belong  to  men  convicted  of  crimes  against 
property;  as  breaking  and  entering,  robbery,  forgery  or  arson,  but 
not  of  any  crimes  against  the  person  (as  assault).  (That  is,  base 
rate  =  .5). 

Your  task  is  to  try  to  discriminate  between  the  MMPI  profiles  as 
to  which  belong  to  the  homicide  group  and  which  do  not.  It  is  possible 
to correctly classify all the profiles. It is hoped that your
prediction will in some way help us to understand one aspect of the
decision  making  process  as  it  is  applied  by  psychologists  in  clinical 
settings. 

INSTRUCTIONS
Phase  II 

In  this  phase  you  will  be  presented  with  20  cases  of  inmates 
at  Raiford  State  Prison.  Each  case  in  the  folder  has  the  appropriate 
MMPI  and  Rorschach  protocol.  Ten  of  the  20  cases  belong  to  men 
convicted  of  first  or  second  degree  murder.  The  remaining  10  profiles 
belong  to  men  convicted  of  crimes  against  property;  as  breaking  and 
entering,  robbery,  forgery  or  arson,  but  not  of  any  crimes  against 
the person (as assault). (That is, base rate = .5).

Your  task  is  to  try  to  discriminate  between  the  cases  as  to  which 
belong  to  the  homicide  group  and  which  do  not.  It  is  possible  to 
correctly  classify  all  the  profiles.  It  is  hoped  that  your  prediction 
will  in  some  way  help  us  to  understand  one  aspect  of  the  decision 
making  process  as  it  is  applied  by  psychologists  in  clinical  settings. 

INSTRUCTIONS
Phase  III 

This  phase  is  similar  to  the  previous  phase  except  that  each 
case in the folder has the appropriate MMPI, Rorschach and
biographical data. Ten of the 20 cases belong to men convicted of first or
second  degree  murder.  The  remaining  10  cases  belong  to  men  convicted 
of  crimes  against  property;  as  breaking  and  entering,  robbery,  forgery 
or  arson,  but  not  of  any  crimes  against  the  person  (as  assault). 
(That  is,  base  rate  =  .5). 

Your  task  is  to  try  to  discriminate  between  the  cases  as  to  which 
belong  to  the  homicide  group  and  which  do  not.  It  is  possible  to 
correctly classify all the profiles. It is hoped that your
prediction will in some way help us to understand one aspect of the
decision  making  process  as  it  is  applied  by  psychologists  in  clinical 
settings. 

INSTRUCTIONS
Phase  IV 

In  the  past  several  weeks  you  have  been  making  decisions  based  on 
MMPI,  Rorschachs  and  biographical  data.  The  purpose  of  the  present 
phase  is  to  provide  you  with  the  optimal  salient  findings  (Shagoury 
and  Satz,  1971)  of  the  statistical  analysis,  performed  on  the  60 
protocols  of  which  the  present  20  is  a  random  sample,  as  found  by  the 
computer.  No  variable  by   itself  was  discriminatory.  However,  when 
the  data  was  subjected  to  a  multivariate  analysis,  the  following 
variables  in  some  combination  (i.e.,  linear)  were  shown  to  correctly 
classify 80% of the sample (only 7 homicides and 3 controls were
misclassified, yielding a valid positive rate of 70% and a false positive
rate of 10%).

These  are  the  salient  variables  as  found  by  the  computer: 

Variables                                  Confidence Value (T)

Goldberg Score                                    7.51 *
M Responses                                      17.74 *
Total Rorschach Responses                         1.70
Percentage of Human Content                       4.46
Percentage of Minus Responses                    20.72 *
Percentage of Whole Responses                   -28.50 *
Sum C                                            -8.60 *
Total Pathological Content Responses              8.41 *
I.Q.                                             -6.33 *
Grades Completed                                 -4.27
Prior Felony Convictions                         -4.50
Prior Misdemeanor Convictions                    -3.13

*  T  12,  47  5.44,  p  .05

Summary  of  Table:     The  homicide  group  showed  a  higher  Goldberg 
score  (+7.51),  more  M  responses   (+17.74),  a  higher  percentage  of  minus 
responses   (+20.72),  a  lower  percentage  of  W  responses   (-28.50),  a 
lower  Sum  C  (-8.60),  more  responses  of  pathological   content  (+8.41), 
and  a  lower  IQ  (-6.33).     No  significant  differences  between  homicide 
and  non-homicide  groups  were  found  with  respect  to  total  number  of 
Rorschach  responses   (1.70),  percentage  of  human  content  (4.46), 
education  level    (-4.27),  and  prior  misdemeanor  or     felony  convictions 
(-3.13;  -4.50). 

Interaction  of  EEG  and  Personality  Variables 

The following analysis was added by Shagoury and Satz after the
computer analysis, and suggests the possibility of a non-linear
combination of data. Although, considered altogether, the abnormal and
normal EEG groups showed no difference on the personality measures, the
question of severity of EEG abnormality and personality disturbance
remained.

There were 11 cases of severe abnormalities in EEG. Five of them
occurred in the homicide group and six in the non-homicide group.
However, every homicide case with a severely abnormal EEG was
associated with personality disorganization, whereas only one case in
the non-homicide group with a severe abnormality in the EEG was
associated with personality disorganization. Borderline abnormalities
in EEG were not discriminatory, with a trend toward more abnormality in
the EEG in the non-homicide group.

The  following  rule  can  be  stated: 

If the biographical-personality variables point to
disorganization and the EEG is severely abnormal,
consider higher probability of homicide behavior.
However, if the biographical-personality data is
not disorganized, and the EEG is severely abnormal,
consider the likelihood of non-homicide.
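
The rule above can be sketched as a small function and checked against the counts reported for the 11 severely abnormal EEG cases. The function name, the return labels, and the case encoding are hypothetical and introduced only for this illustration. Because the rule was derived from these same 11 cases, the resulting figure describes fit to the sample, not predictive accuracy on new cases.

```python
def eeg_rule(disorganized, eeg_severely_abnormal):
    """Apply the stated rule; it is silent when the EEG is not severely abnormal."""
    if not eeg_severely_abnormal:
        return "no prediction"
    return "homicide" if disorganized else "non-homicide"

# (actual group, personality disorganization) for the 11 severe-EEG cases,
# built from the counts given in the text: all 5 homicide cases disorganized,
# and only 1 of the 6 non-homicide cases disorganized.
cases = (
    [("homicide", True)] * 5
    + [("non-homicide", True)] * 1
    + [("non-homicide", False)] * 5
)

hits = sum(eeg_rule(disorganized, True) == actual for actual, disorganized in cases)
print(f"{hits} of {len(cases)} severe-EEG cases agree with the rule")
```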

Your task at this time is to consider these variables (Computer
and interaction) and make your predictions as to which profiles are
homicide and which are not.

Goldberg's  Scores  for  MMPI  Profiles 

The MMPI data was evaluated for the degree of personality
disorganization by means of Goldberg's (1965) formulation. The
Goldberg formula is a quantitative equation based upon the following
scales:

X = L + Pa + Sc - (Hy + Pt)

If the Goldberg value is high (X > 55) the S is classified psychotic,
and if the Goldberg value is low (X < 35) the S is classified neurotic.
Intermediate Goldberg values are considered indeterminate. Using these
cut-off values, Goldberg found a hit rate of 74%, with a valid positive
rate of 62% and a false positive rate of 18%. It has proven to be one
of the better decision rules for differentiating psychotic from
neurotic profiles.

Goldberg  scores  for  your  sample  of  20  protocols: 

Protocol #     Score

44520            34
98737            47
56809            34
84775            37
75658            31
15120            48
33843            28
26335            52
16240            64
46115            36
18386            88
35661            53
56527            40
47398            51
88758            83
21717            31
13763            42
83746            59
22707            22
27998            48
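
As an illustration of how the Goldberg cut-offs sort the 20 scores listed above, the following sketch applies the formula and the two cut-off values. The comparison signs are not legible in the source, so strict inequalities (above 55, below 35) are assumed here; no listed score equals either cut-off, so that assumption does not affect the tally. The function names and scale inputs are hypothetical.

```python
from collections import Counter

def goldberg_index(L, Pa, Sc, Hy, Pt):
    """Goldberg value X = L + Pa + Sc - (Hy + Pt), computed from MMPI scale scores."""
    return L + Pa + Sc - (Hy + Pt)

def classify(x, high=55, low=35):
    """Label a Goldberg value; strictness of the cut-offs is an assumption."""
    if x > high:
        return "psychotic"
    if x < low:
        return "neurotic"
    return "indeterminate"

# Protocol number -> Goldberg score, copied from the table above.
scores = {
    "44520": 34, "98737": 47, "56809": 34, "84775": 37, "75658": 31,
    "15120": 48, "33843": 28, "26335": 52, "16240": 64, "46115": 36,
    "18386": 88, "35661": 53, "56527": 40, "47398": 51, "88758": 83,
    "21717": 31, "13763": 42, "83746": 59, "22707": 22, "27998": 48,
}

tally = Counter(classify(x) for x in scores.values())
for label in ("psychotic", "neurotic", "indeterminate"):
    print(label, tally[label])
```

With these cut-offs, half of the 20 protocols fall in the indeterminate range, so the Goldberg score alone leaves many cases unclassified.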

APPENDIX  B 


Daily  Correct  and  Incorrect  Frequencies 


[Standard Behavior Charts omitted. Vertical axis: movements per minute; horizontal axis: successive calendar days.]


REFERENCES 


Ammons, R.B. Effects of knowledge of performance: a survey and
tentative theoretical formulation. Journal of General Psychology,
1956, 54, 279-299.

Bachrach,  A.J.     Psychological    Research:     an  introduction.     Random 
House,  New  York,   1965. 

Bilodeau,   E.A.   and  Bilodeau,   I.M.     Motor-skills  learning.     Annual 
Review  of  Psychology,  1961,   12,  243-280. 

Blumetti, Anthony. A true test of clinical versus statistical prediction.
Unpublished Doctoral dissertation, University of Florida, 1972.

Cole,  J.K.   and  Magnussen,  M.G.     Where  the  action  is.     Journal   of 
Consulting  Psychology,   1966,   30,  539-543. 

Goldberg, L.R. The effectiveness of clinicians' judgments: the
diagnosis of organic brain damage from the Bender-Gestalt test.
Journal of Consulting Psychology, 1959, 23, 25-33.

Goldberg, L.R. Diagnosticians versus diagnostic signs: the diagnosis
of psychosis versus neurosis from the MMPI. Psychological
Monographs: General and Applied, 1965, 79 (whole #602).

Goldberg,  L.R.     Simple  models  of  simple  processes:     some  research  on 
clinical   judgments.     American  Psychologist,   1968,  23,  483-496. 

Grebstein,  L.     Relative  accuracy  of  actuarial   prediction,  experienced 
clinicians  and  graduate  students   in  a  clinical   judgment  task. 
Journal   of  Consulting  Psychology,   1963,  27,   127-132. 

Hammond,   K.R.,  Hursch,  C.J.   and  Todd,   F.J.     Analyzing  the  components 
of  clinical   inference.     Psychological    Review,   1964,   71,  438-456. 

Hoffman,  P.J.     The  paramorphic  representation  of  clinical   judgment. 
Psychological   Bulletin,   1960,  57,   116-131. 

Holt,  R.R.  Clinical  and  statistical  prediction:  a  reformulation  and 
some  new  data.  Journal  of  Abnormal  and  Social  Psychology,  1958, 
56,   1-12. 


Holt,   R.R.     Yet  another  look  at  clinical   and  statistical   prediction: 
or,   is  clinical   psychology  worthwhile?     American  Psychologist, 
1970,  25,   337-349.  

Hunt,  W.A.,  Wittson,  C.L.   and  Hunt,   E.B.     A  theoretical   and  practical 
analysis  of  the  diagnostic  process.      In  P.H.   Hoch  and  J.   Zubin 
(Eds),  Current  problems   in  psychiatric  diagnosis.     New  York, 
Grune  &  Stratton,   1953,  53-65. 

Hunt,  W.A.,  Arnhoff,   F.N.   and  Cotton,  J.W.     Reliability,   chance  and 
fantasy  in  interjudge  agreement  among  clinicians.     Journal   of 
Clinical   Psychology,   1954,   10,  292-296. 

Hunt,  W.A.   and  Jones,  N.F.     The  experimental   investigation  of  clinical 
judgment.      In  A.J.   Bachrach   (Ed)   Experimental    foundations  of 
Clinical   Psychology,  1962,  26-51. 

Lindsley,  O.R.     Direct  behavioral   analysis  of  psychotherapy  sessions  by 
conjugately  programmed  closed-circuit  television.     Psychotherapy: 
Theory,  Research  and  Practice,  1969,  6,   71-81. 

Little,  K.B.     Research  etiquette  in  the  study  of  clinician's  behavior. 
Journal   of  Consulting  Psychology,   1967,   31,   16-18. 

Luft,  J.     Implicit  hypotheses  and  clinical   predictions.     Journal   of 
Abnormal   and  Social   Psychology,   1950,  45,   756-759. 

Mann,   R.D.     A  critique  of  P.E.   Meehl's  Clinical   versus  Statistical 
Prediction.     Behavioral   Science,   1956,   1,  224-230. 

Meehl,  P.E.     Clinical   versus  Statistical   Prediction.     Minneapolis: 
University  of  Minnesota  Press,  1954. 

Meehl,  P.E.     Wanted  -  a  good  cookbook.     American  Psychologist,  1956, 
11,   263-272. 

Meehl,  P.E.     A  comparison  of  clinicians  with  five  statistical   methods 
of  identifying  psychotic  MMPI  profiles.     Journal   of  Counseling 
Psychology,   1959,  6,   102-109. 

Meehl,  P.E.     The  cognitive  activity  of  the  clinician.     American 
Psychologist,   1960,   15,   19-27. 

Meehl, P.E. Seer over sign: the first good example. Journal of
Experimental Research in Personality, 1965, 1, 27-37.

Moxley, A.W. The effects of statistical information on clinical
judgment. Unpublished Doctoral dissertation, University of Florida,
1970.



Moxley,  A.W.  and  Satz,  P.  The  effects  of  statistical  information  on 
clinical  judgment.  Proceedings:  78th  Annual  Convention,  APA, 
1970. 

Oskamp,  S.  Clinical  judgment  from  the  MMPI:  simple  or  complex? 
Journal  of  Clinical  Psychology,  1967,  23,  411-415. 

Payne,  R.W.  Diagnostic  and  personality  testing  in  clinical  psychology. 
American  Journal  of  Psychiatry,  1958,  115,  25-29. 

Perez,  F.I.  The  effects  of  feedback  on  clinical  predictions. 
Unpublished  Master's  Thesis,  University  of  Florida,  1970. 

Perez, F.I. and Satz, P. The effects of feedback on clinical predictions.
Proceedings: 79th Annual Convention, APA, 1971.

Rotter, J.B. Can the clinician learn from experience? Journal of
Consulting Psychology, 1967, 31, 12-15.

Sarbin, T.R., Taft, R. and Bailey, D.E. Clinical inference and
cognitive theory. New York: Holt, Rinehart and Winston, 1960.

Sawyer, J. Measurement and prediction, clinical and statistical.
Psychological Bulletin, 1966, 66, 178-200.

Sechrest, L., Gallimore, R. and Hersch, P.D. Feedback and accuracy of
clinical predictions. Journal of Consulting Psychology, 1967,
31, 1-11.

Sidman, M. Tactics of Scientific Research. Basic Books, Inc., New
York, 1960.

Shagoury,  P.  and  Satz,  P.  The  effect  of  statistical  information  on 
clinical  prediction.  Proceedings:  77th  Annual  Convention,  APA, 
1969,  310-311. 

Shagoury,  P.  An  exploratory  investigation  of  homicidal  behavior. 
Unpublished  Doctoral  dissertation,  University  of  Florida,  1971. 

Taft,  R.  The  ability  to  judge  people.  Psychological  Bulletin,  1955, 
52,  1-28.  

Taft,  R.  Multiple  methods  of  personality  assessment.  Psychological 
Bulletin,  1959,  56,  333-351.  

Ullmann,  L.P.  and  Krasner,  L.  Research  in  behavior  modification. 
Holt,  Rinehart  and  Winston,  New  York,  1965. 

Ulrich, R., Stachnik, T. and Mabry, J. Control of human behavior.
Scott, Foresman & Co., Glenview, Illinois, 1966.

Ulrich, R., Stachnik, T. and Mabry, J. Control of human behavior.
Scott, Foresman & Co., Glenview, Illinois, 1970.

Underwood,  B.J.     Experimental   Psychology.     Appleton-Century-Crofts, 
New  York,   1966. 

Watley,   D.J.     Feedback  training  and  improvement  of  clinical   forecasting. 
Journal   of  Counseling  Psychology,  1968,   15,   167-171. 

Wiggins, N. and Hoffman, P.J. Three models of clinical judgment.
Journal of Abnormal Psychology, 1968, 73, 70-77.

Wolking, W. and Schwartz, V.A. Applied Behavior Analysis and Learning
Disorders. In P. Satz and J. Ross (Eds) The Disabled Learner:
Early detection and intervention. Rotterdam, The Netherlands:
University of Rotterdam Press, 1972, In Press.


BIOGRAPHICAL  SKETCH 

Francisco I. Perez was born in Havana, Cuba, May 21, 1947.
He  came  to  the  United  States  in  October,  1960.     He  graduated  from 
Belen  Jesuit  Preparatory  School,  Miami,  Florida,  in  June,  1965,  and 
received  his  Bachelor  of  Arts  in  psychology  from  the  University  of 
Florida  in  June,  1969. 

In  June,  1969,  he  enrolled  in  the  Graduate  School  of  the 
University  of  Florida  where  until   the  present  he  has  pursued  his 
work  toward  degrees  of  Master  of  Arts  and  Doctor  of  Philosophy. 
During  this  period  he  has  held  a  USPHS  Fellowship  and  Second-level 
Veterans  Administration  Traineeship.     He  received  his  Master  of 
Arts  degree  in  psychology  in  December,   1970. 

Currently,  he  is  married  to  the  former  Georgina  M.   Montero. 


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


Paul Satz, Chairman
Professor of Psychology and Clinical
Psychology


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


Henry S. Pennypacker
Professor of Psychology


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


Hugh C. Davis
Professor of Psychology and Clinical
Psychology


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is fully
adequate, in scope and quality, as a dissertation for the degree of
Doctor of Philosophy.


Jacquelin R. Goldman
Associate Professor of Psychology
and Clinical Psychology


I certify that I have read this study and that in my opinion it
conforms to acceptable standards of scholarly presentation and is
fully adequate, in scope and quality, as a dissertation for the degree
of Doctor of Philosophy.


William D. Wolking
Associate Professor of Education


This dissertation was submitted to the Department of Psychology
in the College of Arts and Sciences and to the Graduate Council, and
was accepted as partial fulfillment of the requirements for the degree
of Doctor of Philosophy.


August, 1972


Dean, Graduate School

